LONGER-LENGTH ACOUSTIC UNITS FOR CONTINUOUS SPEECH RECOGNITION (ThuAmPO1)
Abstract
Recent research on the TIMIT database suggests that longer-length acoustic units are better suited for modelling pronunciation variation and long-term temporal dependencies in speech than traditional phoneme-length units, yielding substantial improvements in recognition accuracy [9]. In this paper, we investigate whether similar improvements can be gained on another database, viz. excerpts from novels in a Dutch library for the blind. We use a hierarchical method that employs a mixture of word-, syllable- and phoneme-length units. Our results show that the approach does increase the word accuracy, but to a lesser extent than expected. The paper discusses possible explanations for the finding.
Similar resources
Inference of variable-length acoustic units for continuous speech recognition
In the field of speech recognition, the patterns assumed to structure the speech material (phonemes, triphones, words...) are defined a priori according to a linguistic criterion, whereas the recognition criterion is based on an acoustic similarity measure. From this may result a lack of consistency for the recognition units. In this paper, we explore the possibility of a more data-driven approach...
Syllable-Length Acoustic Units in Large-Vocabulary Continuous Speech Recognition
Recent research on the TIMIT corpus suggests that longer-length acoustic units are better suited for modelling coarticulation and long-term temporal dependencies in speech than conventional context-dependent phone models. However, the impressive results achieved on TIMIT [1] are yet to be reproduced on other corpora, such as read speech from the Spoken Dutch Corpus. Differences between TIMIT and...
Split-lexicon based hierarchical recognition of speech using syllable and word level acoustic units
Most speech recognition systems, especially LVCSR, use context-dependent phones as the basic acoustic unit for recognition. The primary motive for this is the relative ease with which phone-based systems can be trained robustly with small amounts of data. However, as recent research indicates, significant improvements in recognition accuracy can be gained by using acoustic units of longer durati...
Inference of variable-length linguistic and acoustic units by multigrams
The efficiency of pattern recognition algorithms is highly conditioned to a proper definition of the patterns assumed to structure the data. The multigram model provides a statistical tool to retrieve sequential variable-length regularities within streams of data. In this paper, we present a general formulation of the model, applicable to single or multiple parallel strings of data having eithe...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...